Dutch patent NL1035941A1 Methods of characterizing similarity between measurements on entities, computer program product and data carrier

Patent abstract:

Publication number: NL1035941A1
Application number: NL1035941
Filing date: 2008-09-16
Publication date: 2009-03-20
Inventors: Christianus Gerardus Maria Mol; Maria Elisabeth Reuhman-Huisken
Applicant: Asml Netherlands Bv
Main IPC:

Description:

METHODS OF CHARACTERIZING SIMILARITY BETWEEN MEASUREMENTS ON ENTITIES, COMPUTER PROGRAM PRODUCT AND DATA CARRIER
Field
The present invention relates to a method of characterizing the similarity between measurements on entities. The invention also relates to a computer program product and a data carrier including such a computer program product.
Background
A lithographic apparatus is a machine that applies a desired pattern onto a substrate, usually onto a target portion of the substrate. A lithographic apparatus can be used, for example, in the manufacture of integrated circuits (ICs). In that instance, a patterning device, which is alternatively referred to as a mask or a reticle, may be used to generate a circuit pattern to be formed on an individual layer of the IC. This pattern can be transferred onto a target portion (e.g., including part of, one, or several dies) on a substrate (e.g., a silicon wafer). Transfer of the pattern is typically via imaging onto a layer of radiation-sensitive material (resist) provided on the substrate. In general, a single substrate will contain a network of adjacent target portions that are successively patterned. Known lithographic apparatus include so-called steppers, in which each target portion is irradiated by exposing an entire pattern onto the target portion at one time, and so-called scanners, in which each target portion is irradiated by scanning the pattern through a radiation beam in a given direction (the "scanning" direction) while synchronously scanning the substrate parallel or anti-parallel to this direction. It is also possible to transfer the pattern from the patterning device to the substrate by imprinting the pattern onto the substrate.
Manufacturing a typical device by a lithographic process typically involves a number of cycles of a variety of steps. These steps may include coating the substrate with a photosensitive material (or otherwise applying a photosensitive material to one or more surfaces of the substrate), projecting an image onto the photosensitive material, developing the photosensitive material and processing the substrate, which can include covering the substrate in a new layer of material. One of the problems that may be encountered with the lithographic process is that successive layers are not accurately imaged on top of each other, so that there is a so-called overlay error. The overlay error may be measured after each cycle. This avoids carrying out the subsequent steps when an overlay error that would be detrimental to the component's performance already exists. If the overlay error is too large, the most recent layer can be removed and that step repeated before proceeding to the next step.
In order to reduce overlay error, substrates are generally provided with a variety of reference marks so that the position of the substrate on a substrate table in a projection apparatus may be measured very precisely prior to the exposure operation. In this way it is possible to improve the accuracy of the exposure operation because the relative positions of the substrate, the previously applied patterned layer and the patterning device in the lithographic apparatus may be determined.
Another problem with multi-cycle lithographic processes is deformation of the substrate, which can occur with the application of particular layers and/or particular patterns. Deformation includes, for example, topographic three-dimensional deformation, deformation of the reference marks (shape or depth) or variation of the properties or thicknesses of layers deposited on the substrate. Chemical mechanical polishing (CMP) is notorious for causing deformation of the substrate. With the use of substrates with a diameter of 300 mm or more, substrate deformation is expected to become an even more important factor. In order to reduce deformation, it may be desirable to keep the processes as uniform as possible over the whole area of the substrate. Deformation of the substrate may lead to errors in the imaging of the substrate, resulting in the need to repeat a particular operation. Also, during the development of a process for a particular component manufactured by lithography, the process may be optimized to minimize, or at least keep within limits, the amount of substrate deformation. Improved yield may result from reducing overlay errors or errors caused by substrate deformation, or at least from detecting one or more of such errors early.
Small particles present on the surface of the substrate may hamper the lithographic process, since proper illumination of the substrate cannot be achieved at the positions of the particles. Generally the particles are so small that they cannot be detected by the level sensors present in the lithographic apparatus. They may, however, be detected when more accurate level sensors are employed, for instance in dedicated measurement equipment or at the manufacturer of the substrate.
Small particles may also be present between the substrate carrier, i.e. the support table or chuck, and the substrate. These particles may deform the layers before the lithographic process is performed.
The particles cause artifacts, so-called focus spots, in the subsequent layers on the surface of the substrate. When the artifacts are positioned beneath an alignment mark, they may give rise to overlay errors as well. Timely detection of the artifacts, i.e. detection before the next layer is applied to the substrate, may lead to improved yield.
One option would be to measure at least parts of the grids and/or shapes of a number of substrates and to show the results in graphical representations. The measurements could be made in separate (offline) measuring instruments (metrology tools) or online in the lithographic apparatus itself, just before a pattern is applied to the substrate. Such a graphical representation typically comprises a huge amount of data, so that in practice it cannot be analyzed and interpreted by a human operator. Interpretation based on graphical representations of the measured substrate grid and shape is rather subjective and time consuming. Given the huge amount of measurement data, it does not provide a workable way of defining the consistency of substrate shape, substrate grid and substrate field shape. This hampers process characterization, criteria for transfer to production, and production quality monitoring.
SUMMARY
It is desirable to provide a method for an efficient characterization of the similarity between measurements on a variety of entities.
According to an aspect of the invention, there is provided a method according to clause 1.
According to an aspect of the invention, there is provided a computer program product according to clause 8.
According to an aspect of the invention, there is provided a data carrier according to clause 9.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying schematic drawings in which corresponding reference symbols indicate corresponding parts, and in which:
Figure 1 schematically depicts a lithographic apparatus according to an embodiment of the invention;
Figure 2 schematically depicts the measurement station region of the lithographic apparatus of Figure 1;
Figure 3A is an example of a graphical representation of the mean correlation as a function of the wafer number for the lot, the first support table and the second support table;
Figure 3B is a graphical representation of the standard deviation (σ) as a function of the wafer number for the example shown in Figure 3A;
Figure 4A is an example of a graphical representation of the mean correlation as a function of the wafer number for the lot, the first support table and the second support table; and
Figure 4B is a graphical representation of the standard deviation (σ) as a function of the wafer number.
DETAILED DESCRIPTION
In an embodiment of the invention a lithographic apparatus is used to perform measurements on a substrate. The lithographic apparatus is schematically depicted in Figure 1. The apparatus comprises:
- an illumination system (illuminator) IL configured to condition a radiation beam B (e.g., UV radiation or EUV radiation);
- a support structure (e.g., a mask table) MT constructed to support a patterning device (e.g., a mask) MA and connected to a first positioner PM configured to accurately position the patterning device in accordance with certain parameters;
- a substrate table (e.g., a wafer table) WT constructed to hold a substrate (e.g., a resist-coated wafer) W and connected to a second positioner PW configured to accurately position the substrate in accordance with certain parameters; and
- a projection system (e.g. a refractive projection lens system) PS configured to project a pattern imparted to the radiation beam B by patterning device MA onto a target portion C (e.g. including one or more dies) of the substrate W.
The illumination system may include various types of optical components, such as refractive, reflective, magnetic, electromagnetic, electrostatic or other types of optical components, or any combination thereof, for directing, shaping, or controlling radiation.
The support structure supports, i.e. bears the weight of, the patterning device. It holds the patterning device in a manner that depends on the orientation of the patterning device, the design of the lithographic apparatus, and other conditions, such as whether or not the patterning device is held in a vacuum environment. The support structure can use mechanical, vacuum, electrostatic or other clamping techniques to hold the patterning device. The support structure may be a frame or a table, for example, which may be fixed or movable as required. The support structure may ensure that the patterning device is at a desired position, for example with respect to the projection system. Any use of the terms "reticle" or "mask" may be considered synonymous with the more general term "patterning device."
The term "patterning device" used herein should be broadly interpreted as referring to any device that can be used to impart a radiation beam with a pattern in its cross-section such as to create a pattern in a target portion of the substrate. It should be noted that the pattern imparted to the radiation beam may not exactly correspond to the desired pattern in the target portion of the substrate, for example if the pattern includes phase-shifting features or so-called assist features. Generally, the pattern imparted to the radiation beam will correspond to a particular functional layer in a device being created in the target portion, such as an integrated circuit.
The patterning device may be transmissive or reflective. Examples of patterning devices include masks, programmable mirror arrays, and programmable LCD panels. Masks are well known in lithography, and include mask types such as binary, alternating phase shift, and attenuated phase shift, as well as various hybrid mask types. An example of a programmable mirror array employs a matrix arrangement of small mirrors, each of which can be individually tilted so as to reflect an incoming radiation beam in different directions. The tilted mirrors impart a pattern in a radiation beam which is reflected by the mirror matrix.
The term "projection system" used herein should be broadly interpreted as encompassing any type of projection system, including refractive, reflective, catadioptric, magnetic, electromagnetic and electrostatic optical systems, or any combination thereof, as appropriate for the exposure radiation being used, or for other factors such as the use of an immersion liquid or the use of a vacuum. Any use of the term "projection lens" may also be considered as synonymous with the more general term "projection system".
As here depicted, the apparatus is of a reflective type (e.g. employing a reflective mask). Alternatively, the apparatus may be of a transmissive type (e.g., employing a transmissive mask).
The lithographic apparatus may be of a type having two (dual stage) or more substrate tables (and / or two or more mask tables). In such "multiple stage" machines the additional tables may be used in parallel, or preparatory steps may be carried out on one or more tables while one or more other tables are being used for exposure.
The lithographic apparatus may also be of a type in which at least a portion of the substrate is covered by a liquid having a relatively high refractive index, e.g., water, so as to fill a space between the projection system and the substrate. Liquid immersion may also be applied to other spaces in the lithographic apparatus, for example, between the mask and the projection system. Immersion techniques are well known in the art for increasing the numerical aperture of projection systems. The term "immersion" as used herein does not mean that a structure, such as a substrate, must be submerged in liquid. It means only that liquid is located between the projection system and the substrate during exposure.
Referring to Figure 1, the illuminator IL receives a radiation beam from a radiation source SO. The source and the lithographic apparatus may be separate entities, for example when the source is an excimer laser. In such cases, the source is not considered to be part of the lithographic apparatus. The radiation beam is passed from the source SO to the illuminator IL with the aid of a beam delivery system BD including, for example, suitable directing mirrors and/or a beam expander. In other cases the source may be an integral part of the lithographic apparatus, for example when the source is a mercury lamp. The source SO and the illuminator IL, together with the beam delivery system BD if required, may be referred to as a radiation system.
The illuminator IL may include an adjuster AD for adjusting the angular intensity distribution of the radiation beam. Generally, at least the outer and/or inner radial extent (commonly referred to as σ-outer and σ-inner, respectively) of the intensity distribution in a pupil plane of the illuminator can be adjusted. In addition, the illuminator IL may include various other components, such as an integrator IN and a condenser CO. The illuminator may be used to condition the radiation beam so that it has a desired uniformity and intensity distribution in its cross-section.
The radiation beam B is incident on the patterning device (e.g., mask MA), which is held on the support structure (e.g., mask table MT), and is patterned by the patterning device. Having traversed the mask MA, the radiation beam B passes through the projection system PS, which focuses the beam onto a target portion C of the substrate W. The second positioner PW and position sensor IF2 (e.g. an interferometric device, linear encoder or capacitive sensor) allow the substrate table WT to be moved accurately, e.g. so as to position different target portions C in the path of the radiation beam B. Similarly, the first positioner PM and another position sensor IF1 can be used to accurately position the mask MA with respect to the path of the radiation beam B, e.g. after mechanical retrieval from a mask library, or during a scan. In general, movement of the mask table MT may be realized with the aid of a long-stroke module (coarse positioning) and a short-stroke module (fine positioning), which form part of the first positioner PM. Similarly, movement of the substrate table WT may be realized using a long-stroke module and a short-stroke module, which form part of the second positioner PW. In the case of a stepper (as opposed to a scanner) the mask table MT may be connected to a short-stroke actuator only, or may be fixed. Mask MA and substrate W may be aligned using mask alignment marks M1, M2 and substrate alignment marks P1, P2. Although the substrate alignment marks as illustrated occupy dedicated target portions, they may be located in spaces between target portions (these are known as scribe-lane alignment marks). Similarly, in situations in which more than one die is provided on the mask MA, the mask alignment marks may be located between the dies.
In a typical dual stage lithographic projection apparatus the number of substrate reference marks or alignment marks might be about 25 per substrate W. For reasons of clarity a smaller number of marks is shown in figure 1.
The depicted apparatus could be used in at least one of the following modes:
1. In step mode, the mask table MT and the substrate table WT are kept essentially stationary, while an entire pattern imparted to the radiation beam is projected onto a target portion C at one time (i.e. a single static exposure). The substrate table WT is then shifted in the X and/or Y direction so that a different target portion C can be exposed. In step mode, the maximum size of the exposure field limits the size of the target portion C imaged in a single static exposure.
2. In scan mode, the mask table MT and the substrate table WT are scanned synchronously while a pattern imparted to the radiation beam is projected onto a target portion C (i.e. a single dynamic exposure). The velocity and direction of the substrate table WT relative to the mask table MT may be determined by the (de-)magnification and image reversal characteristics of the projection system PS. In scan mode, the maximum size of the exposure field limits the width (in the non-scanning direction) of the target portion in a single dynamic exposure. The length of the scanning motion determines the height (in the scanning direction) of the target portion.
3. In another mode, the mask table MT is kept essentially stationary holding a programmable patterning device, and the substrate table WT is moved or scanned while a pattern imparted to the radiation beam is projected onto a target portion C. In this mode, generally a pulsed radiation source is employed and the programmable patterning device is updated as required after each movement of the substrate table WT or in between successive radiation pulses during a scan. This mode of operation can be readily applied to maskless lithography that utilizes a programmable patterning device, such as a programmable mirror array of a type as referred to above.
Combinations and/or variations on the modes of use described above, or entirely different modes of use, may also be employed.
Figure 2 gives a further view of the measurement station region of the lithographic apparatus of Figure 1. The position sensor IF2 shown in Figure 1 measures the lateral positions (x, y-coordinates) and the height positions (z-coordinate) of the substrate W placed on the substrate table WT. It does so at a number of measurement points, for instance at the alignment marks (P1, P2) shown in Figure 1. The substrate table WT is connected to actuators A that may be part of the second positioner PW (not shown in Figure 2). These actuators are connected to a control device CON with a processor CPU and a memory M. The processor CPU further receives information from lateral position sensors LPS (part of the position sensor IF2). These sensors measure the position of the substrate table WT or substrate table holder by electric (capacitive, inductive) or optical, e.g. interferometric (as shown in Figure 1), devices. The processor also receives input from a level sensor LS, which measures height and/or tilt information of the target area C on the substrate W where the projection beam PB hits the substrate surface. The level sensor LS may be, for example, an optical sensor as described here; alternatively, a pneumatic or capacitive sensor (for example) is conceivable.
The term height as used refers to a direction substantially perpendicular to the surface of the substrate W, i.e. substantially perpendicular to the surface of the substrate W that is to be exposed. The level sensor LS measures the vertical position of one or more very small areas generating height data. The level sensor LS may include a light source LS for producing a light beam B, projection optics (not shown) for projecting the light beam B onto the substrate W, detection optics (not shown) and a sensor or detector D. The detector D generates a height dependent signal, which is fed to the processor. The processor is arranged to process the height information and to construct a measured height map, which may be stored in memory M.
An example of a procedure for measuring height data in a projection apparatus is described in U.S. Pat. No. 5,191,200. This procedure may be performed during exposure (on-the-fly), by measuring the part of the substrate W that is being exposed or is to be exposed next. Alternatively, the surface of the substrate W may be measured in advance. This latter approach may also be performed at a remote position, for instance in a separate measuring instrument. In that case, the results of the level sensor measurements may be stored in the form of a so-called height map or height profile. This map or profile is then used during exposure to position the substrate W with respect to the focal plane of the optical elements.
The measurements of the level sensor LS result in height data, including information about the relative heights at specific positions on the substrate W. This may also be referred to as a height map. Based on these height data, a height profile may be computed, for instance by averaging corresponding height data from different parts of the substrate (e.g. height data corresponding to similar relative positions within different target portions C). If such corresponding height data are not available, the height profile is equal to the height data. Based on the height data or a height profile, a leveling profile may be determined, which indicates an optimal positioning of the substrate W with respect to the projection system PS. Such a leveling profile may be determined by applying a linear fit through (part of) the height data or the height profile, e.g. by performing a (three-dimensional) least squares fit through the points that lie inside the measured area.
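The two computations described above can be sketched as follows. This is a minimal example, not the patented procedure, and it rests on two assumptions. First, the height data are held in an array with one row per target portion and one column per relative position inside the target portion (a hypothetical layout). Second, the leveling profile is modeled as a plane z ≈ z0 + a·x + b·y fitted by ordinary linear least squares. The function names `height_profile` and `leveling_profile` are illustrative only.

```python
import numpy as np


def height_profile(height_data) -> np.ndarray:
    """Average corresponding height data over target portions.

    Assumed layout: height_data[k, m] is the height at relative position m
    inside target portion k, so averaging over axis 0 combines the same
    relative position across all target portions.
    """
    return np.asarray(height_data, dtype=float).mean(axis=0)


def leveling_profile(x, y, z):
    """Least-squares plane z ~ z0 + a*x + b*y through the measured points.

    Returns (z0, a, b): the height offset and the slopes (tilts) of the
    plane in the x- and y-directions.
    """
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([np.ones_like(x), x, y])
    (z0, a, b), *_ = np.linalg.lstsq(design, z, rcond=None)
    return float(z0), float(a), float(b)


# Usage: a slightly tilted, noisy wafer surface (synthetic values).
rng = np.random.default_rng(0)
x = rng.uniform(-150.0, 150.0, 500)   # mm
y = rng.uniform(-150.0, 150.0, 500)   # mm
z = 0.1 + 2e-5 * x - 1e-5 * y + rng.normal(0.0, 1e-4, 500)
print(leveling_profile(x, y, z))
```

A plane is the simplest "linear fit" through three-dimensional points. A real leveling calculation may weight points or restrict them to the exposure field, which this sketch does not attempt.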
As explained above, accurate leveling may require measuring the shape (z-positions) and topography (x, y-positions) of the substrate, for instance using a level sensor. This yields height data of (at least part of) the substrate W, from which a leveling profile can be determined. Such a leveling profile may represent the optimal position of the substrate W with respect to the projection system PS, taking into account the local shape and height of the substrate W.
The first operation of the method is a measurement operation, which involves measuring the lateral positions of all or some of the alignment marks or reference marks (P1, P2) on a substrate W. Alternatively or additionally, the measurement operation involves measuring the height positions of the substrate surface. Generally the height positions are measured at measurement points distributed over essentially the entire surface of the substrate. The measurement points may partially or entirely coincide with the alignment marks or reference marks. Generally, however, the measurement points differ from the alignment marks or reference marks, especially when the latter marks are arranged on the grid lines of the substrate.
The measurement operations may be performed in the lithographic projection apparatus, where the position of the reference marks may be measured in any case for substrate W to substrate table WT alignment and leveling measurement, or it may be performed in a separate machine.
During the measurement operation the measuring system measures the relative height positions and/or the lateral positions of the entity to be measured. The height positions are taken in the z-direction, substantially perpendicular to the surface of the substrate to which the pattern is applied. The lateral positions are taken in the x- and y-directions, substantially perpendicular to the z-direction and to each other. The entity may be the substrate itself or a part of it, such as one or more specific fields of a substrate. The positions are measured at a variety of measurement points, in this example the reference marks on the substrate W.
If it is intended to characterize the similarity or quality of the substrates in a lot, the above-mentioned measurement of the relevant positions or reference marks is repeated for all substrates of a given lot. A lot consists of substrates that are subjected to the same operations in the projection device. If, on the other hand, it is intended to characterize the substrates of a given support table (also referred to as the wafer table or chuck) of the projection device, the measurement of the positions (x, y, z) at the measurement points is repeated for all substrates assigned to that support table. These are the substrates that have been or will be processed by that support table.
More generally, a set of substrates is measured, which allows the characterization of the substrates in the substrate set or in a subset thereof. In practice the measurement may often be performed for a lot. This makes it possible to determine the quality of the substrates of the lot (set) or of a subset of the lot, for instance the substrates handled by one of the chucks of the lithographic device.
It is also possible to perform the characterization for the various fields on the same substrate. In this case the operation of measuring the positions at the measuring points is repeated for a variety of the fields of the substrate.
Although (relative) positional information of the substrate(s) is measured in the above-mentioned measurement operation, the operation may involve measurement of any kind of information or data related to the substrate in general and/or to the particular layer of concern, as well as statistical measures. For example, the information may include raw position data; raw sensor data indicative of the substrate markers; and/or calculations from the data, such as magnification, translation, rotation or differences of individual measurements with respect to a reference grid described by parameters.
According to the embodiment, positions are modeled as

$$p_{i,j} = f_{i,j}$$

where the subscripts indicate a dependency on the substrate (i), the measurement point (j), or both. Here p is the position and f is a function of the process for producing the measured substrate. This use of subscripts is continued throughout this document.
The measurements are modeled as

$$w_{i,j} = p_{i,j} + \varepsilon_{i,j}$$

where:
- w is the measured value; and
- ε is a random process which is identically distributed for the different substrates and which is stationary, i.e. stable over the different measurement points (j). This random process may be referred to as measurement noise and comprises, for example, noise related to the measurement device.
The production function for producing the substrate is modeled as

$$f_{i,j} = h_j + g_{i,j}$$

where:
- h is a selected model function which depends only on the measurement points (j); and
- g is a rest term which need not be selected, as will be clarified later.
Instead of being a selected model function, the function h may alternatively represent a function with convenient parameters that can be controlled in future process steps applied to the substrate, such as the magnification to be used.
The rest term (g) should be seen as modeling the process insofar as it is not described by the model function (h). The production function (f) may be wrongly modeled: in reality, it may not be possible to split it into a superposition of a model function (h), which does not depend on the substrate (i), and a rest term, which depends on both the substrate and the measurement point. In that case, the rest term is simply defined by rewriting the above model as

$$g_{i,j} = f_{i,j} - h_j$$
In the embodiment a presumption is made: the process is presumed to have been identical for all substrates. The rest term (g) then does not depend on the substrate either, and $g_{i,j}$ can be written as $g_j$.
It will be clear to the skilled person that the model for the measurements then no longer allows for a deterministic component that depends on the substrate, as neither the model function (h) nor the rest term (g) depends on the substrate (i). The measurements are now modeled as

$$w_{i,j} = h_j + g_j + \varepsilon_{i,j}$$
This may seem counterintuitive to the skilled person, as the method is used to detect inconsistencies between the substrates (i).
Next, the parameters of the selected model function (h) are estimated by fitting the function to the measurements. This may be done, for instance, by applying the least squares criterion, which is a maximum likelihood estimator when the noise is Gaussian. Other criteria may be used, as will be clear to the skilled person. The rest term (g) is not fitted: no function describing it is known, so no parameters of such a function are known either. This amounts to neglecting the rest term (g) while fitting the model function (h). Thus, in the fit, the measurements are described as

$$w_{i,j} = h_j(\beta) + Z_{i,j}$$

in which β stands for the parameter set of the model function (h). The parameter set (β) is modeled to be constant across the various substrates (i) and measurement points (j).
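The fit of the model function can be sketched for one possible choice of h: a linearized grid model with translation (Tx, Ty), magnification M and rotation R, the kind of parameters mentioned earlier in this description. Ordinary least squares is used, and a single parameter set β is shared by all substrates, as the model requires. The 4-parameter form, the function name `fit_grid_model` and the array layout are assumptions for illustration, not taken from the source.

```python
import numpy as np


def fit_grid_model(x, y, dx, dy):
    """Joint least-squares fit of a linearized 4-parameter grid model.

    x, y   : nominal mark positions, shape (n,)
    dx, dy : measured deviations from the nominal grid, shape (n_substrates, n)

    Assumed model function h_j(beta), with beta = (Tx, Ty, M, R):
        dx_j = Tx + M * x_j - R * y_j
        dy_j = Ty + M * y_j + R * x_j
    One beta is shared by all substrates; the residues keep the shape of dx/dy.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = np.atleast_2d(dx).astype(float), np.atleast_2d(dy).astype(float)
    n_sub, n = dx.shape
    one, zero = np.ones(n), np.zeros(n)
    # Design matrix for a single substrate: n x-equations, then n y-equations.
    a_single = np.vstack([
        np.column_stack([one, zero, x, -y]),
        np.column_stack([zero, one, y, x]),
    ])
    # Same design matrix for every substrate, because beta does not depend on i.
    a_all = np.tile(a_single, (n_sub, 1))
    b_all = np.hstack([dx, dy]).reshape(-1)  # per substrate: dx row, then dy row
    beta, *_ = np.linalg.lstsq(a_all, b_all, rcond=None)
    fitted = a_single @ beta
    return beta, dx - fitted[:n], dy - fitted[n:]
```

Because β is shared, any substrate-specific deviation that h cannot describe ends up in that substrate's residues. Those residues are what the subsequent averaging and correlation steps examine.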
Thus $Z_{i,j}$ is a residue comprising a deterministic component, which depends on the measurement point (j), with a random, stationary process superposed on it. The random process depends on both the substrate and the measurement point.
It will be clear to the skilled person that, although the rest term (g) was neglected in the fit, the expectation of the residue $Z_{i,j}$ is the expectation of the random, stationary process superposed on the deterministic component, which depends on the measurement point (j).
In the embodiment, the substrate average residue is then estimated for the first entity (i.e., the substrate in this embodiment) by the expression

$$\bar{Z}_i = \frac{1}{n}\sum_{j=1}^{n} Z_{i,j}$$

using the same notation as in the earlier parts of this document, where n indicates the number of measurement points. The bar over the symbol for the residue indicates that it concerns an average.
It is expected that for the second entity (i.e., substrate in this embodiment), the average value of the residues is equal to the corresponding average value for the first entity. This is because it was presumed that the entities were produced and measured in an identical way so that the rest term (g) does not depend on the entity (i). Therefore it is expected that for the second entity, the average value of the earlier neglected rest term (g) is equal to the corresponding average value for the first entity.
To see further why the average values are expected to be equal, the expression for the average value can be rewritten as

$$\bar{Z}_i = \frac{1}{n}\sum_{j=1}^{n}\left(h_j + g_j - h_j(\beta)\right) + \frac{1}{n}\sum_{j=1}^{n}\varepsilon_{i,j}$$

in which the first sum does not depend on the substrate (i).
Also, $\varepsilon_{i,j}$ is a stationary, identically distributed process. Consequently, with a larger number of measurement points (n), the second sums are expected to be equal for the first and the second substrate.
In any case, the average value of the residues $\bar{Z}_i$ is estimated for the second entity as well.
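In a next step of the embodiment, the correlation coefficient between a first substrate ($i' = 1$) and a second substrate ($i'' = 2$) is estimated according to the expression

$$r_{1,2} = \frac{\sum_{j=1}^{n}\left(Z_{1,j} - \bar{Z}_1\right)\left(Z_{2,j} - \bar{Z}_2\right)}{\sqrt{\sum_{j=1}^{n}\left(Z_{1,j} - \bar{Z}_1\right)^2 \sum_{j=1}^{n}\left(Z_{2,j} - \bar{Z}_2\right)^2}}$$

where $r_{1,2}$ is the estimator for the correlation coefficient.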
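The substrate average residue and the correlation estimator can be sketched in a few lines, assuming the estimator takes the usual sample (Pearson) form. The residues of each substrate are centred on that substrate's average residue, the products are summed over the n measurement points, and the result is normalized by the two sums of squares. The function names are hypothetical.

```python
import numpy as np


def substrate_average_residue(z_i):
    """Average of the residues of one entity over its n measurement points."""
    return float(np.mean(z_i))


def correlation_numerator(z_a, z_b):
    """Sum over measurement points of the products of the centred residues."""
    z_a, z_b = np.asarray(z_a, dtype=float), np.asarray(z_b, dtype=float)
    return float(np.sum((z_a - substrate_average_residue(z_a))
                        * (z_b - substrate_average_residue(z_b))))


def residue_correlation(z_a, z_b):
    """Estimated correlation coefficient between the residues of two entities."""
    return float(correlation_numerator(z_a, z_b) / np.sqrt(
        correlation_numerator(z_a, z_a) * correlation_numerator(z_b, z_b)))


print(correlation_numerator([1, 2], [1, 2]))        # 0.5
print(residue_correlation([1, 2, 3], [3, 2, 1]))    # -1.0
```

For a whole lot, with the residues stored in an array of shape (n_substrates, n_points), `np.corrcoef` applied to that array returns the matrix of all pairwise estimates of this form in one call.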
Note that, as modeled, the expectation of the deterministic part of the residue ($Z_{i,j}$) depends on the measurement point (j) and not on the substrate (i), because the process was presumed to be identical for all substrates (i). In the estimator for the correlation coefficient, however, the average of the residue is estimated per substrate and not per measurement point.
The skilled man will appreciate that the expectation for the stochastic component is zero for all terms within () in the formula for the estimator for the correlation coefficient.
Thus the expectation of the residue (¾) is expected to differ from the estimated substrate average of the residue, so that the expectation of the numerator or the estimator for the correlation coefficient (ruO comprises a deterministic component. For example, gi = l and g2 = 2 so that the numerator or the estimator for the correlation coefficient is expected to be 2 * (1-1.5) (2-1.5) = -0.5.
This approach is advantageous, however, if the variations in the deterministic component are relatively large, in other words when the differences between the values of the rest term (g) are relatively large with respect to the stochastic component (ε) in the residue (Z), or, put differently again, when the model is relatively poor.
The result is that, with a poor model but a valid presumption (i.e. the process was constant), the estimated correlation coefficient will be high, because the deterministic part of the numerator is much larger than the stochastic part of the numerator and because those deterministic parts are equal for entities i' and i''.
If the process changed between the first substrate (i') and the second substrate (i''), the fitted model will fit the measurements (w_{i,j}) less well and the deterministic parts of Z_{i',j} and Z_{i'',j} will differ. The deterministic part of the estimated correlation coefficient will then be low, and the stochastic part of the estimated correlation coefficient becomes more important. It can be imagined that, for highly uncorrelated noise, the estimated correlation coefficient may still be dominated by the deterministic part.
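The contrast between a valid and an invalid presumption can be illustrated with a small simulation. This is a hypothetical sketch, not data from the patent: the fingerprint g_j = x_j², the changed fingerprint x_j, the noise level and the deliberately poor offset-only model are all assumptions chosen for illustration.

```python
import numpy as np


def correlation_coefficient(z1, z2):
    """Sample correlation of two residue vectors, using each entity's own average."""
    d1 = z1 - z1.mean()
    d2 = z2 - z2.mean()
    return float(np.sum(d1 * d2) / np.sqrt(np.sum(d1 ** 2) * np.sum(d2 ** 2)))


def offset_only_residues(w):
    """Deliberately poor model: only a constant is fitted, so g_j stays in the residue."""
    return w - w.mean()


rng = np.random.default_rng(seed=1)
x = np.linspace(-1.0, 1.0, 200)   # hypothetical measurement-point coordinates
noise = 0.02                      # hypothetical noise level

w_first = x ** 2 + rng.normal(0.0, noise, x.size)   # fingerprint g_j = x_j**2
w_same = x ** 2 + rng.normal(0.0, noise, x.size)    # same process
w_changed = x + rng.normal(0.0, noise, x.size)      # process changed: other fingerprint

r_same = correlation_coefficient(offset_only_residues(w_first), offset_only_residues(w_same))
r_changed = correlation_coefficient(offset_only_residues(w_first), offset_only_residues(w_changed))
print(f"same process: r = {r_same:.3f}; changed process: r = {r_changed:.3f}")
```

With the same process the shared, unmodelled fingerprint dominates and r is close to 1; when the fingerprint changes, the deterministic parts no longer match and r drops towards the small, noise-driven value.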
If the correlation between the stochastic parts, i.e. between the realizations of the random process (ε_{i,j}), is low, the stochastic part of the estimated correlation between the residues will be low. The lower the correlation between the stochastic parts, the higher the sensitivity to the validity of the presumption. Also, the larger the misfit, the larger the deterministic part of the residue and the higher the correlation in case the presumption is valid. In other words, the more the number of estimated parameters is reduced when fitting the model function (h), i.e. the more the quality of the model is reduced, the better the estimator for the correlation coefficient can be used to characterize the substrates (or the measurements on the substrates).
This is advantageous, as the lower the number of estimated parameters, the easier it is to monitor and assess them. It will also be clear to the person skilled in the art that the model function (h) may represent a model convenient to the application, having parameters which may be varied during further processing steps of the substrates but which have nothing to do with the earlier production process of the substrate or with the measurement process, as the method works better when the effects of mis-modeling become more dominant relative to the noise levels.
In case the correlation coefficient is too low, for instance below 0.95, 0.9 or 0.85 for all substrates, the model may be adjusted by fitting even fewer parameters in order to increase the deterministic component of the residue.
The influence on the estimated correlation coefficient is used to verify whether the presumption that the process was identical for all substrates was correct, as will be further clarified below.
Without this further clarification, it will be clear to the person skilled in the art that, in an analogous way, errors such as a particle present between the second substrate and a substrate table used to support the substrate, deforming the second substrate but not present between the first substrate and the substrate table, also influence the estimated correlation coefficient. Similarly, the presumption may be found to be incorrect if the two substrates are or have been supported by different substrate tables, both present in the lithographic apparatus. In another similar fashion, the presumption may be found to be incorrect because of layer-to-layer interaction differences between the substrates.
In the embodiment, the measurements (w_{i,j}) are measurements of the difference between measured position data and the corresponding expected position data. Thus each measurement corresponds to a position measurement of a feature of a substrate of the multiple of substrates.
The measurements (w_{i,j}) are modeled as
where β0, β1x, … are members of the parameter set (β) of the model function (h) and are linear coefficients. In this embodiment, the measured position data are composed of fine wafer alignment (FIWA) data (i.e., representative of the lateral positions) taken at various measurement points across the substrate, and of z-map data (representative of the height positions), which comprise one average height position per substrate field, i.e., the average of all measured heights available in the respective substrate field.
If the correlation coefficients are found to be too low, the sensitivity of the correlation coefficient may be raised by discarding one model term, such as one of the first-order terms, and refitting the model function (h). This raises the sensitivity of the correlation coefficient because the deterministic part of the residues is expected to increase.
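A sketch of how discarding a model term raises the correlation, assuming a hypothetical first-order model w = β0 + β1x·x + β1y·y fitted by ordinary least squares with `numpy.linalg.lstsq`. The model terms, coordinates, fingerprint and noise level are illustrative assumptions, since the exact regression model is not reproduced in this text.

```python
import numpy as np


def fit_residues(w, columns):
    """Least-squares fit of a model linear in its coefficients; returns w - A @ beta."""
    A = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(A, w, rcond=None)
    return w - A @ beta


def correlation_coefficient(z1, z2):
    d1, d2 = z1 - z1.mean(), z2 - z2.mean()
    return float(np.sum(d1 * d2) / np.sqrt(np.sum(d1 ** 2) * np.sum(d2 ** 2)))


rng = np.random.default_rng(seed=2)
n = 150
x = rng.uniform(-1.0, 1.0, n)   # hypothetical measurement-point coordinates,
y = rng.uniform(-1.0, 1.0, n)   # identical for both substrates
fingerprint = 0.5 * x + 0.05 * x * y   # hypothetical common deterministic part
w1 = fingerprint + rng.normal(0.0, 0.05, n)
w2 = fingerprint + rng.normal(0.0, 0.05, n)

ones = np.ones(n)
full_model = [ones, x, y]   # assumed terms: beta_0, beta_1x, beta_1y
reduced_model = [ones, y]   # x term discarded

r_full = correlation_coefficient(fit_residues(w1, full_model), fit_residues(w2, full_model))
r_reduced = correlation_coefficient(fit_residues(w1, reduced_model), fit_residues(w2, reduced_model))
print(f"full model: r = {r_full:.3f}; reduced model: r = {r_reduced:.3f}")
```

With the full model almost all of the shared fingerprint is fitted away and the residues are mostly noise; after the x term is discarded, the unmodelled 0.5·x part remains in both residues and dominates the estimate.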
In another embodiment, the measured position data are composed of z-map data (representative of the height positions) taken at several measurement points per substrate field, and another regression model may be used:
where similar symbols are used for similar model terms, and thus β0, β1, … are members of the parameter set (β) of the model function (h) and are linear coefficients.
More generally, the method described in this document is applied to a multiple of entities, such as all substrates in a batch or lot, with, for instance, each entity corresponding to several position measurements.
Once the correlation coefficient (r_{i',i''}) between the residues has been estimated, the estimated correlation coefficient is compared to a threshold value to determine whether the presumption that the process was identical for all substrates was correct. If the presumption is not correct, there is a dissimilarity between the entities. In the case of more than two sets of residues, two or more correlation coefficients may be estimated and compared to the threshold value in order to determine the extent to which a data set is similar to each of the other data sets.
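Comparing the pairwise correlation coefficients with a threshold value might look as follows. The rows-as-entities layout, the helper names and the default threshold of 0.90 (the value used in the examples later in this document) are assumptions of this sketch.

```python
import numpy as np


def correlation_matrix(Z):
    """Pairwise r between the residue rows of Z (rows: entities, columns: measurement points)."""
    Z = np.asarray(Z, dtype=float)
    D = Z - Z.mean(axis=1, keepdims=True)   # subtract each entity's own average residue
    norms = np.sqrt(np.sum(D ** 2, axis=1))
    return (D @ D.T) / np.outer(norms, norms)


def similar_pairs(Z, threshold=0.90):
    """Map each pair (a, b), a < b, to True if r_ab reaches the (hypothetical) threshold."""
    R = correlation_matrix(Z)
    n = R.shape[0]
    return {(a, b): bool(R[a, b] >= threshold) for a in range(n) for b in range(a + 1, n)}


# Hypothetical residues of three entities at four measurement points.
Z = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 4.0, 6.0, 8.1],
     [4.0, 3.0, 2.0, 1.0]]
print(similar_pairs(Z))
```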
In a further embodiment of the invention, for each entity of the multiple of entities, the correlation coefficients with respect to every other entity of the multiple of entities are estimated. Averaging these correlation coefficients over the entity set gives a measure of the validity of the presumption that the process was identical for all entities within the multiple of entities. An example of an expression for estimating the average correlation coefficient of the entity set is given by:

\bar{r} = \frac{2}{N(N-1)} \sum_{i'=1}^{N-1} \sum_{i''=i'+1}^{N} r_{i',i''}

with \bar{r} the average correlation coefficient and N the number of entities in the set. The estimated average correlation coefficient of the entity set is compared to a threshold value. If the estimated average correlation coefficient is larger than the threshold value, it is determined that the presumption was valid and that the entities in the entity set are highly consistent, whereas, when the average correlation coefficient remains lower than the threshold value, it is determined that the presumption was not valid and that the entities lack sufficient consistency. In the latter case the entities are rejected. Alternatively, further analyses may be performed, for instance to assess whether the entities are sufficiently similar.
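A sketch of the set average, assuming the average is taken over all distinct pairs of entities. The correlation values and the 0.90 threshold below are hypothetical, chosen only to exercise the code.

```python
import numpy as np


def average_set_correlation(R):
    """Mean of r over all distinct entity pairs (upper triangle, diagonal excluded)."""
    R = np.asarray(R, dtype=float)
    rows, cols = np.triu_indices(R.shape[0], k=1)
    return float(R[rows, cols].mean())


# Hypothetical pairwise correlation coefficients for three entities.
R = np.array([[1.00, 0.98, 0.97],
              [0.98, 1.00, 0.60],
              [0.97, 0.60, 1.00]])
r_bar = average_set_correlation(R)
presumption_valid = r_bar > 0.90   # hypothetical threshold value
print(r_bar, presumption_valid)
```

Because r is symmetric in its two entities, averaging over ordered pairs would give the same value; restricting to the upper triangle simply avoids counting each pair twice.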
As mentioned earlier, in case the entities are substrates to be processed by the projection device and the entity set comprises substrates from a lot that are to be subjected to the same operations in the projection device, the above procedure provides a measure of the lot consistency. The lot consistency may be lowered by the use of different support tables. In case the entities are substrates to be processed by a projection device and the entity set comprises substrates to be carried by the same support table of the projection device, the procedure provides a measure of the support table consistency (also referred to as wafer table consistency or chuck consistency). In case the entities are fields of at least one substrate to be processed by a projection device and the entity set comprises the fields of said at least one substrate, the procedure provides a measure of the field consistency of that substrate. The entities may also be formed by different layers of a substrate. In this case the procedure provides a measure of inter-layer consistency.
In a further embodiment, an average correlation coefficient per entity is estimated. For instance, if the multiple of entities consists of substrates of the same product lot (typically 25 wafers), then for each substrate (wafer) the correlation coefficient with respect to each of the other substrates (wafers) of the lot is estimated. The estimator for the average correlation coefficient \bar{r}_i of wafer (i) is given by:

\bar{r}_{i} = \frac{1}{N-1} \sum_{i'' \neq i} r_{i,i''}

where the notation used throughout this document is followed and N is the number of wafers in the lot. The estimated average correlation coefficients of the respective entities are then compared with one or more threshold values. When one or more of the estimated average correlation coefficients are lower than the threshold value, it is determined that the presumption is not valid and that the multiple of entities lacks consistency.
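The per-entity average and its threshold comparison can be sketched as below. The six-entity correlation matrix is hypothetical, with one entity (index 4) deliberately made to deviate, and the 0.90 threshold is an assumption borrowed from the examples in this document.

```python
import numpy as np


def average_entity_correlation(R):
    """r_bar_i: mean correlation of entity i with every other entity (diagonal excluded)."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    return (R.sum(axis=1) - np.diag(R)) / (n - 1)


def entities_below_threshold(R, threshold=0.90):
    """Indices (0-based) of entities whose average correlation is below the threshold."""
    r_bar = average_entity_correlation(R)
    return [int(i) for i in np.flatnonzero(r_bar < threshold)]


# Hypothetical lot of six entities in which entity 4 deviates from the others.
R = np.full((6, 6), 0.99)
R[4, :] = 0.6
R[:, 4] = 0.6
np.fill_diagonal(R, 1.0)
print(average_entity_correlation(R).round(3), entities_below_threshold(R))
```

Note that a deviating entity also pulls down the averages of the others slightly (0.912 here instead of 0.99); with a realistic lot of 25 wafers this effect is diluted.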
In another embodiment, the comparison with one or more threshold values is replaced by a comparison of the values with each other. In this case, after the above calculations have been performed, the average correlation coefficients per entity are compared with each other in order to determine the extent of consistency within the multiple of entities. For instance, visualizing the estimated average correlation coefficients of all entities in the entity set in a graphical representation, in this example by plotting the estimated average correlation coefficient of each wafer in the lot, makes it easier for an operator to visually pinpoint an entity or group of entities (for instance a wafer or group of wafers having a deviating wafer grid (x, y), field shape (z) and/or wafer shape (z)) responsible for poor lot consistency. In this case, too, an additional comparison with a threshold value may be performed. The deviations of the respective estimated average correlation coefficients per entity from a threshold value, which is preferably derived from the overall mean value of the averaged correlation coefficients per entity, may be used as an indication of the validity of the presumption and of the consistency within the multiple of entities.
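One way to compare the per-entity averages with each other is to look at their deviations from the overall mean. The fixed tolerance used here to flag a deviating entity is a hypothetical rule for illustration; the text only states that the threshold is preferably derived from the overall mean. The per-wafer values are likewise hypothetical.

```python
import numpy as np


def deviations_from_overall_mean(r_bar):
    """Deviation of each per-entity average correlation from the mean over all entities."""
    r_bar = np.asarray(r_bar, dtype=float)
    return r_bar - r_bar.mean()


def deviating_entities(r_bar, tolerance=0.1):
    """Entities whose average lies more than `tolerance` below the overall mean (hypothetical rule)."""
    deviations = deviations_from_overall_mean(r_bar)
    return [int(i) for i in np.flatnonzero(deviations < -tolerance)]


# Hypothetical per-wafer averages; index 3 corresponds to the fourth wafer.
r_bar = [0.95, 0.96, 0.94, 0.60, 0.95]
print(deviations_from_overall_mean(r_bar).round(2), deviating_entities(r_bar))
```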
In an embodiment of the invention, the measurements are modeled as

w_{i,j} = p'_{i,j} + \varepsilon_{i,j}

where p' models both the positions (p) of features on the entities (i) and the influence of the measurement device, such as drift. Again the model comprises a stable random process (ε). The remainder of the method of the invention is applied in a similar way. In this embodiment, the estimate of the correlation coefficient is used to verify whether the measurement process has changed. For instance, the measurements may have drifted because of an instability of the measurement device, or the entities may have been substrates supported on a substrate table, with contamination present between the substrate and the substrate table for some of the substrates. Alternatively, some substrates may have been supported on a first substrate table during the measurements, while the remainder of the substrates in the multiple of substrates were supported on a second substrate table that supported them in a different way during the measurements.
In a further embodiment, the parameter set (β) of the model function (h) is modeled to depend on the entity. Thus each substrate, containing different measurement points, is fitted with its own parameters, while the deterministic part of the residue (Z) is modeled to be the same for all entities, i.e., a constant fingerprint for the different entities. In this fingerprint, the rest term (g_j) is modeled to differ from zero in at least one measurement point (and to be the same for all entities). Typically, the rest term (g_j) has a different value for each measurement point. The further process steps are analogous to what has been explained earlier: again the fitted model is subtracted from the measurements, giving the residues, which are used to estimate the correlation coefficient. In this embodiment as well, the expectation of the deterministic part of the residue (Z) typically differs between different measurement points (j).
As in the earlier embodiment, in the estimator for the correlation coefficient the average of the residue is estimated per substrate and not per measurement point. Again, as in the earlier embodiment, the expectation of the stochastic component is zero for all terms within the parentheses in the formula for the estimator of the correlation coefficient. Therefore the estimated correlation coefficient will again be dominated by the deterministic part of the residue and can be used to characterize the similarity between the measurements.
In the following, two examples are given of characterizing the consistency between a multiple of entities, in this case the wafers of a lot of 25 wafers. A first part of the wafers of the lot is measured while seated on a first support table (chuck), while the remaining wafers are measured while seated on the second support table of a lithographic apparatus as described. Figure 3A shows a graphical representation of the estimated average correlation coefficient \bar{r}_i (with i = 1 to 25) (vertical axis) as a function of the wafer number (i) (horizontal axis). A first set of wafers is handled by a first support table (chuck 1). This first set is indicated by diamond-shaped labels. A second set of wafers is handled by a second support table (chuck 2). The second set is indicated by triangular labels. Clearly the wafers handled by the second support table (chuck 2; diamonds), i.e. wafers 1, 3, 5, 7, ..., show a good correlation, with estimated average correlation coefficients of more than 0.95. However, the estimated average correlation coefficients of the wafers handled by the first support table (chuck 1; triangles; i.e. wafers 2, 4, 6, ..., 24) show a dip at the defective wafer 8. The estimated average correlation coefficient drops from a value of about 0.95 to about 0.6. The same effect is present in the curve of the estimated average correlation coefficients of all wafers (i.e. the wafers of the lot), which is indicated here by dots (lot). When, for instance, the threshold value is chosen at 0.90, it can easily be derived from the measurement results that only wafer 8 cannot be considered consistent with the rest of the wafers.
The estimation of the correlation coefficients and their comparison with a threshold value is in itself sufficient to characterize the consistency of the set. However, from this information alone it is difficult, and sometimes even impossible, to determine the cause of any lack of consistency.
In an embodiment, the standard deviation (field repeatability) is estimated as well, to help determine the cause of the inconsistency of wafer 8. In the example, the defect of wafer 8 is caused by a large focus spot resulting from the presence of a small particle between the wafer and the support table. The small particle causes an artifact in one field of the wafer only. Figure 3B shows the estimated standard deviation of the measured position data per wafer, in nanometers (vertical axis), plotted against the wafer number (horizontal axis). The artifact is directly reflected in the rise of the estimated standard deviation of the defective wafer 8.
The standard deviation is estimated as
Figures 4A and 4B give similar graphical representations of the estimated average correlation coefficient per wafer and the estimated standard deviation per wafer for another example. In figure 4A the estimated average correlation coefficient (vertical axis) is plotted against the wafer number (horizontal axis). Again the wafers handled by the second substrate table (chuck 2; diamonds) show a good correlation, with estimated average correlation coefficients of more than 0.99. However, the estimated average correlation coefficients of the wafers of the first chuck (triangles), and of course of the lot (dots), show a dip at the fourth wafer. The estimated average correlation coefficient of the fourth wafer is clearly below the threshold value of 0.90, and hence wafer 4 is not consistent with the other wafers of the lot. However, the curve of the estimated standard deviations in figure 4B, which shows the standard deviation (vertical axis) plotted against the wafer number (horizontal axis), remains more or less flat, indicating that the inconsistency of the fourth wafer may have been caused by a different phenomenon from the one that caused the defect in the example of figures 3A and 3B.
In the present example, the wafer defect was caused by a defective layer structure (layer stack). Since the defect was not restricted to a small area but was present over at least a large part of the wafer surface, the estimated standard deviation of the measured position data of the fourth wafer proved to be of the same order of magnitude as the estimated standard deviations of the other wafers of the lot. Therefore, in this example the estimated standard deviation of the position data (figure 4B) does not show a rise similar to that in the first example (figure 3B).
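The two indicators could be combined in a simple diagnostic. This is a hypothetical heuristic inspired by the two examples above, not a procedure stated in the patent; the sample standard deviation with `ddof=1`, the reference value and the factor of 1.5 are assumptions.

```python
import numpy as np


def per_wafer_std(W):
    """Sample standard deviation (ddof=1) of each wafer's measured position data.

    Assumption: rows are wafers, columns are measurement points.
    """
    return np.asarray(W, dtype=float).std(axis=1, ddof=1)


def suggest_cause(r_bar_i, std_i, std_reference, r_threshold=0.90, std_factor=1.5):
    """Hypothetical heuristic combining both indicators; thresholds are illustrative only."""
    if r_bar_i >= r_threshold:
        return "consistent"
    if std_i > std_factor * std_reference:
        return "local artifact suspected (e.g. particle between wafer and support table)"
    return "large-area cause suspected (e.g. defective layer stack)"


stds = per_wafer_std([[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]])
print(stds)
print(suggest_cause(0.6, std_i=3.0, std_reference=1.0))
print(suggest_cause(0.6, std_i=1.1, std_reference=1.0))
```

In practice `std_reference` might be taken as, for example, the median standard deviation of the other wafers in the lot; that choice is likewise an assumption.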
In a further embodiment, the measurements are modeled to include a stable random process and a deterministic process. The deterministic process is expanded in a Taylor expansion with derivatives with respect to the measurement points (position coordinates) and the entities, for which a mathematical formula need not be known. The derivatives with respect to the measurement points (position coordinates) include derivatives with respect to two of the three orthogonal directions. Alternatively, the derivatives include derivatives with respect to all three orthogonal directions. A presumption is made that the entities were produced and measured in an identical way (setting the derivative terms with respect to the entity (i) to zero). Then a limited number of linear coefficients related to the measurement point (j) as variable is estimated by fitting the Taylor-expansion terms corresponding to these linear coefficients to the measurements (w_{i,j}). More specifically, a Taylor expansion with the position coordinates as variables is made. This corresponds to neglecting, when estimating the limited number of linear coefficients, those Taylor-expansion terms related to the measurement point (j) whose linear coefficient is not included in the limited number of linear coefficients.
Limiting the number of linear coefficients is advantageous, since a small number of parameters characterizing possibly vast amounts of data is easier to monitor, store and handle than larger amounts of data.
The limited number of linear coefficients is estimated with a maximum likelihood estimator, such as an estimator resulting from the least-squares criterion. Then the residues (Z_{i,j}) (which include the neglected, non-fitted Taylor-expansion terms related to the measurement point (j), in other words the position variable, and the stable random process ε_{i,j}) are determined for a first entity and a second entity of the multiple of entities.
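A sketch of fitting a limited number of Taylor-expansion coefficients by least squares, assuming two lateral coordinates x and y and keeping all monomials up to a chosen total degree. This term selection and the noise-free test data are assumptions; the text leaves the choice of the limited set open.

```python
import numpy as np


def taylor_design_matrix(x, y, order):
    """Columns x**p * y**(d - p) for every total degree d <= order (two lateral coordinates)."""
    return np.column_stack(
        [x ** p * y ** (d - p) for d in range(order + 1) for p in range(d + 1)]
    )


def fit_limited_taylor(x, y, w, order):
    """Least-squares estimate of a limited set of linear coefficients; returns (beta, residues)."""
    A = taylor_design_matrix(x, y, order)
    beta, *_ = np.linalg.lstsq(A, w, rcond=None)
    residues = w - A @ beta   # neglected higher-order terms (plus noise, if present)
    return beta, residues


rng = np.random.default_rng(seed=4)
x = rng.uniform(-1.0, 1.0, 50)   # hypothetical measurement-point coordinates
y = rng.uniform(-1.0, 1.0, 50)
w = 1.0 + 2.0 * x - 3.0 * y + 0.5 * x ** 2   # hypothetical noise-free measurements

beta1, z1 = fit_limited_taylor(x, y, w, order=1)   # columns: 1, y, x
beta2, z2 = fit_limited_taylor(x, y, w, order=2)   # columns: 1, y, x, y**2, x*y, x**2
print(np.abs(z1).max(), np.abs(z2).max())
```

With only first-order terms the neglected 0.5·x² term ends up in the residues, which is exactly the deterministic component the correlation estimate relies on; the second-order fit absorbs it.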
Then, for the first entity, the average value of the residues is estimated by the expression

\bar{Z}_{i'} = \frac{1}{n} \sum_{j=1}^{n} Z_{i',j}

where the same notation is used as in the earlier parts of this document.
As it was presumed that the entities were produced and measured in an identical way, the derivative terms with respect to the entity (i) were set to zero. Therefore it is expected that, for the second entity, the average value of the earlier neglected Taylor-expansion terms related to the measurement point (j) is equal to the corresponding average value for the first entity. Also, the expected values of the random variables are equal for both entities. In any case, the average value of the residues is estimated for the second entity as well.
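The residues and both estimated average values for the first and second entities are used to estimate a correlation coefficient (r_{1,2}) between the first entity and the second entity according to

r_{1,2} = \frac{\sum_{j=1}^{n} (Z_{1,j} - \bar{Z}_{1})(Z_{2,j} - \bar{Z}_{2})}{\sqrt{\sum_{j=1}^{n} (Z_{1,j} - \bar{Z}_{1})^{2} \, \sum_{j=1}^{n} (Z_{2,j} - \bar{Z}_{2})^{2}}}

where again the same notation is used as earlier in this document.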
The estimated correlation coefficient is used to determine the similarity between the measurements of the entities. The similarity may be found to be either satisfactory or unsatisfactory, the latter for instance because of process changes during the production of the entities or changes to the measurement process.
It will be clear to the person skilled in the art that the amount of calculation may be reduced by computing the residues only for a limited number of entities, such as the first and second entity, out of a larger number of entities. Additionally or alternatively, the residues for the first and second entity may be computed for only a part of the measurement points.
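Computing the correlation on only part of the measurement points can be sketched as follows. The random subset of one tenth of the points, the sinusoidal fingerprint and the noise level are hypothetical choices for illustration.

```python
import numpy as np


def correlation_on_subset(z1, z2, indices):
    """Correlation of two residue vectors using only the measurement points in `indices`."""
    a = np.asarray(z1, dtype=float)[indices]
    b = np.asarray(z2, dtype=float)[indices]
    da, db = a - a.mean(), b - b.mean()
    return float(np.sum(da * db) / np.sqrt(np.sum(da ** 2) * np.sum(db ** 2)))


rng = np.random.default_rng(seed=3)
n = 1000
g = np.sin(np.linspace(0.0, 3.0 * np.pi, n))   # hypothetical shared fingerprint
z1 = g + rng.normal(0.0, 0.05, n)
z2 = g + rng.normal(0.0, 0.05, n)

subset = rng.choice(n, size=100, replace=False)   # one tenth of the measurement points
r_all = correlation_on_subset(z1, z2, np.arange(n))
r_subset = correlation_on_subset(z1, z2, subset)
print(f"all points: r = {r_all:.3f}; subset: r = {r_subset:.3f}")
```

As long as the subset still samples the deterministic fingerprint well, the estimate stays close to the full-data value at a fraction of the cost.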
Although specific reference may be made in this text to the use of lithographic apparatus in the manufacture of ICs, it should be understood that the lithographic apparatus described may have other applications, such as the manufacture of integrated optical systems, guidance and detection patterns for magnetic domain memories, flat-panel displays, liquid-crystal displays (LCDs), thin-film magnetic heads, etc. The skilled artisan will appreciate that, in the context of such alternative applications, any use of the terms "wafer" or "die" may be considered as synonymous with the more general terms "substrate" or "target portion", respectively. The substrate referred to may be processed, before or after exposure, in for example a track (a tool that typically applies a layer of resist to a substrate and develops the exposed resist), a metrology tool and/or an inspection tool. Where applicable, the disclosure may be applied to such and other substrate processing tools. Further, the substrate may be processed more than once, for example in order to create a multi-layer IC, so the term substrate used may also refer to a substrate that already contains multiple processed layers.
Although specific reference may have been made above to the use of the invention in the context of optical lithography, it will be appreciated that the invention may be used in other applications, for example imprint lithography, and where the context allows, is not limited to optical lithography. In imprint lithography a topography in a patterning device defines the pattern created on a substrate. The topography of the patterning device may be pressed into a layer of resist supplied to the substrate, whereupon the resist is cured by applying electromagnetic radiation, heat, pressure or a combination thereof. The patterning device is moved out of the resist, leaving a pattern in it after the resist is cured.
The terms "radiation" and "beam" used encompass all types of electromagnetic radiation, including ultraviolet (UV) radiation (e.g. having a wavelength of or about 365, 355, 248, 193, 157 or 126 nm) and extreme ultra-violet (EUV) radiation (e.g. having a wavelength in the range of 5-20 nm), as well as particle beams, such as ion beams or electron beams.
The term "lens", where the context allows, may refer to any one or combination of various types of optical components, including refractive, reflective, magnetic, electromagnetic and electrostatic optical components.
The term "projection system" used should be broadly interpreted as encompassing various types of projection system, including refractive optical systems, reflective optical systems, and catadioptric optical systems, as appropriate for example for the exposure radiation being used, or for other factors such as the use of an immersion fluid or the use of a vacuum. Any use of the term "lens" may be considered as synonymous with the more general term "projection system."
While specific embodiments of the invention have been described above, it will be appreciated that the invention may be practiced otherwise than as described. For example, the invention may take the form of a computer program containing one or more sequences of machine-readable instructions describing a method as disclosed above, or a data storage medium (e.g. semiconductor memory, magnetic or optical disk) having such a computer program stored therein.
The descriptions above are intended to be illustrative, not limiting. Thus, it will be apparent to one skilled in the art that modifications may be made to the invention as described without departing from the scope of the clauses set out below.
Other aspects of the invention are set out in the following numbered clauses:
1. Method for characterizing the similarity between measurements on a multiple of entities (i) including a first entity (i') and a second entity (i''), the method comprising:
- receiving measurements (w_{i,j}) taken at a multiple of measurement points (j) per entity (i);
- defining a model including a stochastic process and a model function (h), the model function (h) having values which depend on a set of parameters (β) and the measurement points (j);
- estimating the set of parameters (β) by fitting the model function to the measurements (w_{i,j});
- determining residual data (Z_{i,j}) at the multiple of measurement points (j) for the first entity (i') and the second entity (i'') by subtracting the fitted model function from the measurements;
- estimating a correlation coefficient (r_{i',i''}) for the first entity (i') and the second entity (i'') based on the determined residual data (Z_{i,j});
- using the estimated correlation coefficient (r_{i',i''}) to characterize the similarity,
characterized by defining the model such that the residual data is expected to have a deterministic component which depends on the measurement points and which dominates the estimate of the correlation coefficient, and by estimating the correlation coefficient (r_{i',i''}) using an estimate for the entity average residue averaged over the measurement points of the first entity (i') and an estimate for the entity average residue averaged over the measurement points of the second entity (i'').
2. Method of clause 1, wherein the set of parameters (β) comprises parameters which depend on the entities.
3. Method of any of the preceding clauses, wherein the entities are substrates, substrate layers or substrate fields.
4. Method of any of the preceding clauses, wherein the measurements correspond to position measurements of features of an entity of the multiple of entities.
5. Method of any of the preceding clauses, including:
- comparing the estimated correlation coefficient to a threshold value to determine the extent of similarity between the entities.
6. Method of any of the preceding clauses, including:
- estimating at least one further correlation coefficient between the first entity (i') and a third entity (i''') based on the residual data (Z_{i,j});
- estimating an average set correlation coefficient based on the estimated correlation coefficient and the estimated further correlation coefficient; and
- using the estimated average correlation coefficient to characterize the similarity.
7. Method of any of the clauses 1 to 5, including:
- estimating the correlation coefficients between the first entity (i') and each other entity in the multiple of entities;
- estimating an average entity correlation coefficient for the first entity (i');
- estimating the correlation coefficients between a further entity (i'''') and each other entity in the multiple of entities;
- estimating a further average entity correlation coefficient for the further entity (i'''');
- comparing the estimated average entity correlation coefficient and the estimated further average entity correlation coefficient to characterize similarity.
8. Computer program product including instructions and data to allow a processor to run a predetermined program in accordance with any one of the methods of clauses 1-7.
9. Data carrier including a computer program product according to clause 8.

Claims:
1. Method for characterizing the similarity between observations on a number of entities comprising a first entity (i') and a second entity (i''), the method comprising:
- receiving observations (w_{i,j}) at a number of measurement points (j) per entity (i);
- defining a model comprising a stochastic process and a model function (h), wherein the model function (h) has values that depend on a set of parameters (β) and the measurement points (j);
- estimating the set of parameters (β) by fitting the model function to the observations (w_{i,j});
- determining residues (Z_{i,j}) for the number of measurement points (j) for the first entity (i') and the second entity (i'') by subtracting the fitted model function from the observations;
- estimating a correlation coefficient (r_{i',i''}) for the first entity (i') and the second entity (i'') based on the determined residues (Z_{i,j});
- using the estimated correlation coefficient (r_{i',i''}) to determine the similarity between the observations,
characterized by defining the model such that the residues are expected to have a deterministic component which depends on the measurement points and which dominates the estimate of the correlation coefficient, and by estimating the correlation coefficient using an estimate for an average over the measurement points of the residue of the first entity (i') and an estimate for an average over the measurement points of the residue of the second entity (i'').

Family patents:

Publication No. | Publication date

US7916275B2|2011-03-29|

US20090073403A1|2009-03-19|

JP2009135425A|2009-06-18|

JP4837010B2|2011-12-14|

Cited documents:

Publication No. | Filing date | Publication date | Applicant | Title

NL9100410A|1991-03-07|1992-10-01|Asm Lithography Bv|IMAGE DEVICE EQUIPPED WITH A FOCUS ERROR AND / OR TILT DETECTION DEVICE.|

JP3492226B2|1999-02-03|2004-02-03|Hitachi, Ltd.|Method of narrowing down the cause of semiconductor failure|

JP2001237177A|1999-12-14|2001-08-31|Nikon Corp|Position sensing method, position sensor, exposure method, aligner, recording medium and manufacturing method of device|

WO2002029870A1|2000-10-05|2002-04-11|Nikon Corporation|Method of determining exposure conditions, exposure method, device producing method and recording medium|

JP4170611B2|2001-03-29|2008-10-22|Toshiba Corporation|Defect detection method and defect detection apparatus for semiconductor integrated circuit|

JP3553029B2|2001-05-24|2004-08-11|ProMOS Technologies Inc.|Analysis method and apparatus in wafer manufacturing process|

TWI451475B|2004-08-19|2014-09-01|Nikon Corporation|An alignment information display method and a recording device having a program, an alignment method, an exposure method, a component manufacturing method, a display system, a display device, a measurement device, and a measurement method|

JP4771753B2|2005-06-08|2011-09-14|Shinko Electric Industries Co., Ltd.|Surface light source control apparatus and surface light source control method|

US7889318B2|2007-09-19|2011-02-15|Asml Netherlands B.V.|Methods of characterizing similarity between measurements on entities, computer programs product and data carrier|

NL2005464A|2009-12-17|2011-06-21|Asml Netherlands Bv|A method of detecting a particle and a lithographic apparatus.|

US9287127B2|2014-02-17|2016-03-15|Taiwan Semiconductor Manufacturing Co., Ltd.|Wafer back-side polishing system and method for integrated circuit device manufacturing processes|

CN107579036B|2016-07-04|2020-08-11|Semiconductor Manufacturing International Corporation|Semiconductor device and method for manufacturing the same|

CN106645593B|2017-01-24|2019-07-12|Zhejiang A&F University|Harmful influence poisons reagent leak hunting method|

CN106970180B|2017-01-24|2019-06-25|Zhejiang A&F University|Poison reagent leakage monitoring method|

US10115687B2|2017-02-03|2018-10-30|Applied Materials, Inc.|Method of pattern placement correction|

EP3364247A1|2017-02-17|2018-08-22|ASML Netherlands B.V.|Methods & apparatus for monitoring a lithographic manufacturing process|

US10585360B2|2017-08-25|2020-03-10|Applied Materials, Inc.|Exposure system alignment and calibration method|

Legal status:
2009-05-06| AD1A| A request for search or an international type search has been filed|

Priority:

Application No. | Filing date | Title

US90218607|2007-09-19|

US11/902,186|US7916275B2|2007-09-19|2007-09-19|Methods of characterizing similarity or consistency in a set of entities|
